BPMF using posterior propagation

Downloading the data files

In this example we use the ChEMBL dataset of compound-protein activities (IC50). The IC50 values and ECFP fingerprints can be downloaded with the following smurff function:


In [ ]:
import smurff

ic50_train, ic50_test, ecfp = smurff.load_chembl()

Running SMURFF

Next we create a BPMF training session, trainSession, and call run. The run function builds the model and returns the predictions for the test data.


In [ ]:
trainSession = smurff.BPMFSession(
                       Ytrain     = ic50_train,
                       Ytest      = ic50_test,
                       num_latent = 16,
                       burnin     = 40,
                       nsamples   = 20,
                       verbose    = 1,
                       save_freq  = 1,
                       )

trainSession.run()

In [ ]:
import numpy as np

predict_session = trainSession.makePredictSession()

In [ ]:
# collect U for all samples
Us = [ s.latents[0] for s in predict_session.samples() ]

# stack them and compute mean
Ustacked = np.stack(Us)
mu = np.mean(Ustacked, axis = 0)

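# Compute one covariance matrix per entity (last axis of Ustacked).
# First split along that axis so each piece holds all samples for one entity.
Uunstacked = np.squeeze(np.split(Ustacked, Ustacked.shape[2], axis = 2))
# rowvar = False: rows are samples, columns are latent dimensions
Ucov = [ np.cov(u, rowvar = False) for u in Uunstacked ]
# restack with the entity index as the last axis
Ucovstacked = np.stack(Ucov, axis = 2)
# flatten each covariance matrix into a single column
Lambda = Ucovstacked.reshape(Ucovstacked.shape[0]*Ucovstacked.shape[1], Ucovstacked.shape[2])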
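
The shape handling in the cell above can be checked on toy data. The following is only a sketch: it replaces the sampled latent matrices with random arrays and assumes each sample of U is laid out as (num_latent, num_rows), which is what the split along the last axis suggests. The names rng, num_rows and per_row are introduced here for illustration and are not part of smurff.

```python
import numpy as np

rng = np.random.default_rng(0)
nsamples, num_latent, num_rows = 20, 4, 5   # toy sizes (assumed layout)

# Random stand-in for [ s.latents[0] for s in predict_session.samples() ]
Us = [rng.normal(size=(num_latent, num_rows)) for _ in range(nsamples)]
Ustacked = np.stack(Us)                     # (nsamples, num_latent, num_rows)

# Posterior mean of every latent entry, averaged over the samples
mu = Ustacked.mean(axis=0)                  # (num_latent, num_rows)

# Bring the entity axis to the front; same result as the squeeze/split above
per_row = np.moveaxis(Ustacked, 2, 0)       # (num_rows, nsamples, num_latent)

# One (num_latent x num_latent) sample covariance per entity
Ucov = np.stack([np.cov(u, rowvar=False) for u in per_row], axis=2)

# Flatten each covariance matrix into one column
Lambda = Ucov.reshape(num_latent * num_latent, num_rows)

print(mu.shape, Lambda.shape)
```

Column j of Lambda, reshaped back to (num_latent, num_latent), is the sample covariance of the latent vector of entity j across the saved samples.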

In [ ]:
session2 = smurff.BPMFSession(
                       Ytrain     = ic50_train,
                       Ytest      = ic50_test,
                       num_latent = 16,
                       burnin     = 40,
                       nsamples   = 20,
                       verbose    = 1,
                       save_freq = 1,
                       )
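# pass the mean and flattened covariances of U from the first run
# as the propagated posterior for mode 0
session2.addPropagatedPosterior(0, mu, Lambda)
predictions = session2.run()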
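
To compare the second run against the first, one option is to score the returned predictions with a root mean squared error. The sketch below is self-contained and runs on toy values. It assumes that each prediction carries the observed value and the prediction averaged over samples; the field names val and pred_avg, the Pred stand-in type and the rmse helper are assumptions made for illustration, so check them against the objects smurff actually returns.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for one item of `predictions`; the field names
# `val` (observed value) and `pred_avg` (averaged prediction) are assumptions.
Pred = namedtuple("Pred", ["val", "pred_avg"])

def rmse(preds):
    """Root mean squared error between observed and averaged predicted values."""
    return math.sqrt(sum((p.val - p.pred_avg) ** 2 for p in preds) / len(preds))

# Toy data, not real ChEMBL values
toy = [Pred(6.0, 5.5), Pred(7.0, 7.5), Pred(5.0, 5.0)]
print(rmse(toy))
```

If the real prediction objects expose the same two fields, the helper can be applied to them unchanged.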
